A large vocabulary continuous speech recognition hybrid system for the Portuguese language
Abstract
Large-vocabulary, speaker-independent continuous speech recognition systems have developed enormously, but essentially for US English, so there is great demand for such systems in other languages. In this paper we present our work on a large-vocabulary, speaker-independent continuous speech recognition hybrid system for European Portuguese. The task is difficult because this technology is still at an early stage of development for European Portuguese. Building such a system for a new language depends on the availability of appropriate resources, chiefly a speech corpus and large amounts of text. This work was made possible by a new database, BD-PUBLICO, a large-vocabulary speech corpus for European Portuguese that we developed over the last two years.
Similar sources
Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting
Islamic Republic of Iran Broadcasting (IRIB), one of the largest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, IRIB's archive is one of the richest in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieving content from this archive, is one of the key issues for this broadcasting...
The development of a speaker independent continuous speech recognizer for Portuguese
The development and evaluation of large vocabulary, speaker-independent continuous speech recognition systems are mainly done for the American English language. In this paper we present the work done to date in the development of a hybrid large vocabulary, speaker-independent continuous speech recognition system for the European Portuguese language. Due to the lack of a large appropriate speec...
The use of syllable segmentation information in continuous speech recognition hybrid systems applied to the Portuguese language
Recent work has shown that using syllables as the basic unit in a speech recognition system can be very useful. These works introduced methods that exploit syllable information as a means of adding robustness to "traditional" systems that use phonemes/phones as the basic unit. Since Portuguese is a highly syllabic language, we expected that information from syllables would introduce potenti...
Speech Recognition of Broadcast News for the European Portuguese Language
This paper describes our work on the development of a large vocabulary continuous speech recognition system applied to a Broadcast News task for the European Portuguese language in the scope of the ALERT project. We start by presenting the baseline recogniser AUDIMUS, which was originally developed with a corpus of read newspaper text. This is a hybrid system that uses a combination of phone pr...